PRODIS -- a speech database and a phoneme-based language model for the study of predictability effects in Polish
Malisz, Zofia, Foremski, Jan, Kul, Małgorzata
We present a speech database and a phoneme-level language model of Polish. The database and model are designed for the analysis of prosodic and discourse factors and their impact on acoustic parameters in interaction with predictability effects. The database is also the first large, publicly available Polish speech corpus of excellent acoustic quality that can be used for phonetic analysis and for training multi-speaker speech technology systems. The speech in the database is processed by a pipeline that is 90% automated. The pipeline incorporates state-of-the-art, freely available tools, enabling expansion of the database or adaptation to additional languages.
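To make the idea of a phoneme-level language model concrete, the sketch below trains an add-alpha smoothed bigram model over phoneme sequences. The phoneme strings and smoothing choice are illustrative assumptions, not the PRODIS model or phone set:

```python
import math
from collections import Counter

# Hypothetical phoneme-level transcriptions (toy Polish words);
# PRODIS uses its own phone inventory and far more data.
corpus = [
    ["k", "o", "t"],       # "kot" (cat)
    ["k", "o", "s", "a"],  # "kosa" (scythe)
    ["o", "k", "o"],       # "oko" (eye)
]

# Count phoneme unigrams and bigrams, padding with boundary symbols.
BOS, EOS = "<s>", "</s>"
unigrams, bigrams = Counter(), Counter()
for seq in corpus:
    padded = [BOS] + seq + [EOS]
    unigrams.update(padded[:-1])
    bigrams.update(zip(padded[:-1], padded[1:]))

def bigram_logprob(seq, alpha=0.1):
    """Add-alpha smoothed log-probability of a phoneme sequence."""
    V = len(unigrams) + 1  # vocabulary size incl. end symbol
    padded = [BOS] + seq + [EOS]
    lp = 0.0
    for prev, cur in zip(padded[:-1], padded[1:]):
        lp += math.log((bigrams[(prev, cur)] + alpha)
                       / (unigrams[prev] + alpha * V))
    return lp

# A frequent transition ("k" -> "o") scores higher than unseen ones,
# which is the kind of predictability signal such a model provides.
assert bigram_logprob(["k", "o"]) > bigram_logprob(["t", "k"])
```

Per-phoneme log-probabilities (or surprisal, their negation) from a model like this are what predictability-effect studies typically correlate with acoustic parameters such as duration.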
Construction and Evaluation of Mandarin Multimodal Emotional Speech Database
Zhu, Ting, Li, Liangqi, Duan, Shufei, Zhang, Xueying, Xiao, Zhongzhe, Jia, Hairong, Liang, Huizhi
A multimodal emotional Mandarin speech database including articulatory kinematics, acoustics, glottal signals and facial micro-expressions is designed and established, and is described in detail in terms of corpus design, subject selection, recording procedure and data processing. Signals are labeled with both discrete emotion labels (neutral, happy, pleasant, indifferent, angry, sad, grief) and dimensional emotion labels (pleasure, arousal, dominance). In this paper, the validity of the dimensional annotation is verified by statistical analysis of the annotation data. The annotators' SCL-90 scale data are validated and analyzed together with the PAD annotation data, so as to explore the relationship between outliers in the annotation and the annotators' psychological state. To verify the speech quality and emotion discriminability of the database, this paper uses three baseline models (SVM, CNN and DNN) to compute recognition rates for the seven emotions. The results show that the average recognition rate over the seven emotions is about 82% using acoustic data alone, about 72% using glottal data alone, and 55.7% using kinematic data alone. The database is therefore of high quality and can serve as an important resource for speech analysis research, especially for multimodal emotional speech analysis.
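The reported figures are averages over the seven emotion classes. The snippet below shows that computation with made-up per-emotion rates; the paper reports only the averages (~82% acoustic, ~72% glottal, 55.7% kinematic), so every per-class number here is a placeholder:

```python
# Hypothetical per-emotion recognition rates (%) for the acoustic-only
# condition; chosen only to illustrate macro-averaging, not real results.
acoustic_rates = {
    "neutral": 90, "happy": 84, "pleasant": 78, "indifferent": 80,
    "angry": 86, "sad": 79, "grief": 77,
}

# Macro average: mean of per-class rates, each emotion weighted equally.
macro_avg = sum(acoustic_rates.values()) / len(acoustic_rates)
print(round(macro_avg, 1))  # 82.0
```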
Annotated Speech Corpus for Low Resource Indian Languages: Awadhi, Bhojpuri, Braj and Magahi
Kumar, Ritesh, Singh, Siddharth, Ratan, Shyam, Raj, Mohit, Sinha, Sonal, Lahiri, Bornini, Seshadri, Vivek, Bali, Kalika, Ojha, Atul Kr.
In this paper we discuss in-progress work on the development of a speech corpus for four low-resource Indo-Aryan languages -- Awadhi, Bhojpuri, Braj and Magahi -- using the field methods of linguistic data collection. The corpus currently stands at approximately 18 hours (roughly 4-5 hours per language) and is transcribed and annotated with grammatical information such as part-of-speech tags, morphological features and Universal Dependencies relations. We discuss our methodology for data collection in these languages, most of which took place during the COVID-19 pandemic; one of the aims was to generate additional income for low-income groups speaking these languages. We also discuss the results of baseline experiments on automatic speech recognition systems for these languages.
BEA-Base: A Benchmark for ASR of Spontaneous Hungarian
Mihajlik, P., Balog, A., Gráczi, T. E., Kohári, A., Tarján, B., Mády, K.
Hungarian is spoken by 15 million people, yet easily accessible Automatic Speech Recognition (ASR) benchmark datasets -- especially for spontaneous speech -- have been practically unavailable. In this paper, we introduce BEA-Base, a subset of the BEA spoken Hungarian database comprising mostly spontaneous speech of 140 speakers. It is built specifically to assess ASR, primarily for conversational AI applications. After defining the speech recognition subsets and task, several baselines -- including a classic HMM-DNN hybrid and end-to-end approaches augmented by cross-language transfer learning -- are developed using open-source toolkits. The best results are obtained with multilingual self-supervised pretraining, achieving a 45% relative recognition error rate reduction compared to the classical approach, without an external language model or additional supervised data. The results show the feasibility of using BEA-Base for training and evaluating Hungarian speech recognition systems.
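The 45% figure is a *relative* error rate reduction, not an absolute drop. The sketch below shows how the two relate, using a made-up baseline error rate since the abstract does not state absolute numbers:

```python
# Illustrative only: the baseline error rate below is a placeholder,
# not a figure from the BEA-Base paper.
baseline_err = 40.0        # hypothetical baseline error rate (%)
relative_reduction = 0.45  # the reported 45% relative reduction

# A 45% relative reduction scales the baseline error by (1 - 0.45).
improved_err = baseline_err * (1 - relative_reduction)
print(f"{improved_err:.1f}")  # 22.0
```

So the same 45% relative reduction corresponds to different absolute improvements depending on where the baseline sits, which is why relative and absolute figures should not be conflated when comparing ASR systems.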
What Is Natural Language Processing and How Does It Work? - Text2Speech Blog
In 1950, Alan Turing published his famous paper "Computing Machinery and Intelligence", which proposed a test to determine whether a machine is artificially intelligent. In essence, Turing argued that if a machine could hold a conversation with a human and convince the human that it, too, was a person, then the machine was artificially intelligent. This became known as the Turing Test, and passing it has been one of the most sought-after goals in computer science. Passing the Turing Test would signal the birth of artificial intelligence.